Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games
Authors
Abstract
Multiagent learning is a key problem in AI. In the presence of multiple Nash equilibria, even agents with non-conflicting interests may not be able to learn an optimal coordination policy. The problem is exacerbated if the agents do not know the game and independently receive noisy payoffs. Multiagent reinforcement learning therefore involves two interrelated problems: identifying the game and learning to play. In this paper, we present optimal adaptive learning, the first algorithm that converges to an optimal Nash equilibrium with probability 1 in any team Markov game. We provide a convergence proof, and show that the algorithm’s parameters are easy to set to meet the convergence conditions.
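The coordination problem mentioned in the abstract can be illustrated with a minimal sketch. The payoff matrix and function names below are hypothetical and are not taken from the paper; the code only enumerates pure Nash equilibria of a small one-shot team game (both agents receive the same payoff) and does not implement optimal adaptive learning.

```python
from itertools import product

# Hypothetical 2-agent team game (assumption, not from the paper).
# PAYOFF[a1][a2] is the payoff shared by both agents.
PAYOFF = [
    [10, 0],
    [0, 10],
]

def is_nash(payoff, a1, a2):
    """True if neither agent can raise the shared payoff by deviating alone."""
    current = payoff[a1][a2]
    best_for_1 = max(payoff[b][a2] for b in range(len(payoff)))
    best_for_2 = max(payoff[a1][b] for b in range(len(payoff[0])))
    return current >= best_for_1 and current >= best_for_2

def pure_nash_equilibria(payoff):
    """List all pure-strategy Nash equilibria as (a1, a2) pairs."""
    rows, cols = range(len(payoff)), range(len(payoff[0]))
    return [(a1, a2) for a1, a2 in product(rows, cols) if is_nash(payoff, a1, a2)]

def optimal_equilibria(payoff):
    """Keep only the equilibria with the highest shared payoff."""
    eqs = pure_nash_equilibria(payoff)
    best = max(payoff[a1][a2] for a1, a2 in eqs)
    return [(a1, a2) for a1, a2 in eqs if payoff[a1][a2] == best]

optimal = optimal_equilibria(PAYOFF)

# If agent 1 aims for one optimal equilibrium and agent 2 for another,
# the resulting joint action may not be an equilibrium at all.
mismatched = [
    (e1[0], e2[1])
    for e1, e2 in product(optimal, repeat=2)
    if (e1[0], e2[1]) not in optimal
]

print("optimal equilibria:", optimal)
print("miscoordinated joint actions:", mismatched)
```

In this toy game both (0, 0) and (1, 1) are optimal, yet agents that independently choose their part of different optimal equilibria end up at a joint action with payoff 0, which is why a coordination mechanism is needed even when interests are fully aligned.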
Similar resources
Multiagent reinforcement learning: algorithm converging to Nash equilibrium in general-sum discounted stochastic games
Reinforcement learning has turned out to be a technique that allows robots to ride a bicycle, computers to play backgammon at the level of human world masters, and complicated high-dimensional tasks such as elevator dispatching to be solved. Can it come to the rescue in the next generation of challenging problems, such as playing football or bidding on virtual markets? Reinforcement learning that provides a way o...
Rational and Convergent Model-Free Adaptive Learning for Team Markov Games
In this paper, we address multi-agent decision problems where all agents share a common goal. This class of problems is suitably modeled using finite-state Markov games with identical interests. We tackle the problem of coordination and contribute a new algorithm, coordinated Q-learning (CQL). CQL combines Q-learning with biased adaptive play, a coordination mechanism based on the principle of f...
Multiagent Reinforcement Learning in Stochastic Games
We adopt stochastic games as a general framework for dynamic noncooperative systems. This framework provides a way of describing the dynamic interactions of agents in terms of individuals' Markov decision processes. By studying this framework, we go beyond the common practice in the study of learning in games, which primarily focuses on repeated games or extensive-form games. For stochastic games...
Simple reinforcement learning agents: Pareto beats Nash in an algorithmic game theory study
Repeated play in games by simple adaptive agents is investigated. The agents use Q-learning, a special form of reinforcement learning, to direct learning of behavioral strategies in a number of 2×2 games. The agents are able to effectively maximize the total wealth extracted. This often leads to Pareto optimal outcomes. When the reward signals are sufficiently clear, Pareto optimal outcomes w...